Abstract

In this paper we raise the question of how to compress sparse graphs. By introducing the idea
of redundancy, we find a way to measure the overlap of neighbors between nodes in networks. We
exploit symmetry by making use of this overlap, and we analyze how the information needed to
describe a network is reduced by shrinking the network and storing it in a specific data
structure that we introduce. We then pose compression as an optimization problem over the
possible choices of orbits. To find a reasonably good solution to this problem we use a greedy
algorithm to determine the orbit of symmetry identifications that achieves compression. Some
example implementations of our algorithm are illustrated and analyzed.






I. INTRODUCTION 



Complex networks have been studied extensively in recent years in the fields of mathematics,
physics, computer science, biology, sociology, etc. Various networks are
used to model and analyze real world objects and their interactions with each other. For
example, in sociology, airports and the flights that connect them can be represented by a
network [12]; in biology, yeast reactions are also modeled by a network [9]; etc. The mathematical
terminology for a network is conveniently described in the language of graph theory. A
common encoding of graphs uses an adjacency matrix, or an edge list when the adjacency
matrix is sparse [5]. However, for a large network even the edge list requires a large amount
of storage. If an important network is transferred frequently between
computers, it saves time and cost to have a scheme that efficiently encodes, and therefore
compresses, the network first. Fundamentally, we find it relevant to ask how much
information is necessary to represent a given network, and how symmetry can be exploited to
this end.

In this paper we demonstrate one way to reduce the information storage of a network
by using the idea that graphs habitually have many nodes that share many common neighbors.
So instead of recording all the links we can store only some of them, together with the
differences between neighbors. The ideal compression ratio using this scheme is
$\eta = \frac{2}{\langle k \rangle}$, where $\langle k \rangle$ is the average degree of the network, compared to the standard
compression using the Yale Sparse Matrix Format (YSMF) [7], which gives
$\eta_Y = \frac{1}{2} + \frac{1}{\langle k \rangle}$. In practice the ideal ratio is
not attainable, but the real compression ratio is still better than that of YSMF, as shown by
our results.

A graph $G = (V, E)$ is a set of vertices (or nodes) $V = \{v_1, v_2, \ldots, v_N\}$ together with edges
(or links) $E = \{(v_i, v_j)\}$, which are the connected pairs. Graphs are often used to model
networks. It is sometimes convenient to call the vertices that connect to a vertex $i$ in a
graph the neighbors of $i$. We will only consider undirected and unweighted graphs in
this paper.

A drawing as in Figure 1 allows us to directly visualize the graph (i.e. the nodes and
the connections between them), but anyone who works with real world graphs
from real data knows that those graphs are commonly so large that even a drawing will
not give any insight. Visualizing structure in graphs of such sizes ($N > 100$ to $1000$) begs
for some computer assistance.

Figure 1: A drawing of a planar embedding of an example graph.

An adjacency matrix is a common, although inefficient, data representation of a graph.
The adjacency matrix $A_G$ of a graph $G = (V, E)$ is an $N \times N$ square matrix, where $N$ is the
number of vertices of the graph, and the entries $a_{ij}$ of $A_G$ are defined by:

$$
a_{ij} = \begin{cases} 1 & \text{if node } i \text{ and node } j \text{ are connected,} \\ 0 & \text{else.} \end{cases} \tag{1}
$$

For example, the adjacency matrix $A_G$ for the graph in Figure 1 is

$$
A_G = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}. \tag{2}
$$



However, when the number of edges in a graph is so small that the corresponding
adjacency matrix is sparse, the edge list is used instead. The edge list is a list of all the
pairs of nodes that form edges in a graph. It is essentially the same as the edge set $E$ for a
graph $G = (V, E)$. Using the edge list $E_G$ to represent the same graph as above we have:

$$
E_G = \{\{1, 2\}, \{2, 3\}, \{3, 4\}, \{2, 4\}\}. \tag{3}
$$



Note that in the edge list we record the labels of the two nodes of each edge in the
graph, so for an undirected graph we can exchange the order within each pair of nodes.
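The two encodings discussed above can be converted into each other mechanically. The following is a minimal sketch for the four-node example with edges {1,2}, {2,3}, {3,4}, {2,4}; the function names `adjacency_from_edges` and `edges_from_adjacency` are illustrative choices of ours, not part of the paper.

```python
import numpy as np

def adjacency_from_edges(edges, n):
    """Build the symmetric 0/1 adjacency matrix of an undirected simple graph.

    `edges` holds pairs of 1-based node labels, as in the edge list E_G.
    """
    A = np.zeros((n, n), dtype=int)
    for u, v in edges:
        A[u - 1, v - 1] = 1
        A[v - 1, u - 1] = 1  # undirected: the order within a pair is irrelevant
    return A

def edges_from_adjacency(A):
    """Recover the edge list from the upper triangle of A (1-based labels)."""
    rows, cols = np.nonzero(np.triu(A, k=1))
    return [(int(i) + 1, int(j) + 1) for i, j in zip(rows, cols)]

E_G = [(1, 2), (2, 3), (3, 4), (2, 4)]
A_G = adjacency_from_edges(E_G, 4)
print(A_G)
print(edges_from_adjacency(A_G))
```

The matrix stores N^2 entries regardless of how many edges exist, whereas the edge list grows only with the number of edges, which is why the edge list is preferred for sparse graphs.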

We will only consider sparse simple graphs, whose adjacency matrices are thus binary
sparse matrices, and the standard information storage for such graphs or matrices is
the number of information units needed for the corresponding edge list (or two-dimensional
array).

We now sharpen the definition of the unit of information in our context. From the
perspective of information theory, a message which contains $N$ different symbols requires
$\log_2 N$ bits for each symbol, without any further coding scheme. The edge list representation
is one example of a text file which contains $N$ different symbols (often represented by the natural
numbers from 1 to $N$) for a graph containing $N$ vertices. Note that the unit of information
depends only on the number of symbols that appear in the message, i.e. the number of
vertices in the graph, so for any given graph it is a fixed number. Thus, when we
restrict the discussion to a particular graph, it is convenient to assume that each pair of
labels in the edge list requires one information unit, without making explicit the size
of that unit. For example, the graph above requires 4 information units. In this paper we
focus on how to represent the same graph using fewer information units than its original
representation.

II. A MOTIVATING EXAMPLE AND THE IDEA OF REDUNDANCY 

As a motivating example, let us consider the following graph and its edge list. 




Figure 2: An extreme example which shows similarity between vertices. 

Note that here the neighbors of node 1 are almost the same as those of node 2. The edge
list $E$ for this graph is:

E = {{1,3}, {1,4}, {1,5}, {1,6}, {1,7}, {1,8}, {1,9}, 

{2, 4}, {2, 5}, {2, 6}, {2, 7}, {2, 8}, {2, 9}, {2, 10}}. (4) 

This requires 14 information units for the edge list. However, if we look back to the 
graph, we note that in this graph there are many common neighbors between node 1 and 
node 2, so there is a great deal of information redundancy. Considering the subgraphs, the 
neighbors of node 1 are almost the same as the neighbors of node 2, except that node 3 links 
to 1, but not 2, while node 10 links to 2, but not 1. 
Figure 3: Similar subgraphs of the original graph. Here the subgraph containing node 1 (on the
left) is very similar to the one dominated by node 2 (on the right). 

Taking the redundancy into account, we generate a new way to describe the same graph.
In the graph of Figure 2, we see that the subgraph including vertices
1, 3, 4, 5, 6, 7, 8, 9 is very similar to the subgraph including vertices 2, 4, 5, 6, 7, 8, 9, 10; see
Figure 3. We exploit this redundancy in our coding.

We store the subgraph which consists only of node 1 and all its neighbors. Then we add
just two more parameters,

$$
\alpha = (1, 2) \tag{5}
$$

and

$$
\beta = \{-3, 10\}, \tag{6}
$$

that allow us to reconstruct the original graph. Here the ordered pair $\alpha = (1, 2)$ tells us
that in order to reconstruct the original graph we first need to copy node 1 to node 2. By
copy, we mean adding a new node with label 2 to the existing graph and then
linking all the neighbors of node 1 to the new node 2.




Figure 4: Construction from the subgraph and the parameter $\alpha = (1, 2)$: 'copy' from node 1 to node 2.

The set $\beta = \{-3, 10\}$ tells us that we should then delete the link that connects the new
node 2 and node 3, and add a new link between node 2 and node 10.
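The copy-then-correct reconstruction described above can be sketched in a few lines. This is only an illustration under our own assumptions: the function name `copy_and_patch` is hypothetical, edges are held as a set of `frozenset` pairs, and the case in which the two matched nodes are themselves linked is left out.

```python
def copy_and_patch(edges, alpha, beta):
    """Rebuild a graph from a stored subgraph plus one (alpha, beta) record.

    edges : iterable of 2-tuples, the edge list of the stored subgraph
    alpha : (i, j), copy the neighbors of node i onto a new node j
    beta  : signed labels; -v deletes the link {j, v}, +v adds the link {j, v}
    """
    i, j = alpha
    graph = {frozenset(e) for e in edges}
    # Step 1: 'copy' node i to node j by linking j to every neighbor of i.
    neighbors_of_i = {v for e in graph if i in e for v in e if v != i}
    graph |= {frozenset((j, v)) for v in neighbors_of_i if v != j}
    # Step 2: apply the corrections listed in beta.
    for b in beta:
        link = frozenset((j, abs(b)))
        if b < 0:
            graph.discard(link)
        else:
            graph.add(link)
    return graph

# Stored subgraph: node 1 and its neighbors 3..9.
E_SG = [(1, v) for v in range(3, 10)]
rebuilt = copy_and_patch(E_SG, (1, 2), [-3, 10])

# The full 14-edge graph of the example, for comparison.
original = {frozenset(e) for e in
            [(1, v) for v in range(3, 10)] + [(2, v) for v in range(4, 11)]}
print(rebuilt == original)
```

Seven stored edges plus the two short records are enough to recover all fourteen edges.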
Figure 5: Add and delete links according to $\beta = \{-3, 10\}$.

After all these operations we see that we have successfully reconstructed the graph from fewer
information units; in this case the stored subgraph has half as many edges as the original
edge list, and $\alpha$ and $\beta$ add only a small overhead. So instead of
equation (4), we may use the edge list of the subgraph

$$
E_{SG} = \{\{1, 3\}, \{1, 4\}, \{1, 5\}, \{1, 6\}, \{1, 7\}, \{1, 8\}, \{1, 9\}\} \tag{7}
$$

as well as the two sets

$$
\alpha = (1, 2), \quad \beta = \{-3, 10\}. \tag{8}
$$

Figure 6: Reconstruction of the original graph using a subgraph and the parameters $\alpha$ and $\beta$.

The above example suggests that by exploiting the symmetry of a graph, we might be able
to reduce the information storage for certain graphs by using a small subgraph together with
$\alpha$ and $\beta$ as defined above.

However, there remains the question of how to choose the pair of vertices so that we
actually reduce the information, and of which pair is the best possible. It is important to
answer these questions since most graphs are so large that we will never be able to
see the symmetry just by inspection, as we did for the toy example above.

In the following we answer the first question, and partly the second, by using a greedy 
algorithm. In section 3 we will define information redundancy for a binary sparse matrix and 
show that it reveals the neighbor similarity between vertices in a graph which is represented 
by its corresponding adjacency matrix. Then in section 4 we will give a detailed description 
of our algorithm which allows us to implement our main idea. Then in section 5 we will 
show some examples of these applications followed by discussion in section 6. 



III. INFORMATION REDUNDANCY AND COMPRESSION OF SPARSE MATRICES

A. How to Choose Pairs of Vertices to Reduce Information 

The graphs we seek to compress are typically represented by large sparse adjacency
matrices. An edge list is a specific data structure for representing such matrices that reduces
information storage. We will consider the edge list form to be the standard way of storing
sparse matrices, which requires $M$ units of information for a graph with $M$ edges. There are
several approaches to compressing sparse matrices, among which the most general is the Yale Sparse
Matrix Format (YSMF) [7], which does not make any assumption on the structure of the matrix
and only requires $\frac{1}{2}(M + N)$ units of information. There are other approaches, such as [8],
which emphasize not only the storage but also the cost of data access time. We will focus
on the data storage, so the Yale format will be considered as a basic benchmark approach
for compression of a sparse matrix, to which we will compare our results. The Yale format
yields the compression ratio

$$
\eta_Y = \frac{M + N}{2M} = \frac{1}{2} + \frac{1}{\langle k \rangle}, \tag{9}
$$

where $\langle k \rangle = \frac{2M}{N}$ is the average degree of the graph.

We will show our approach to compressing sparse matrices by first illustrating how
the redundancy of a binary sparse matrix is defined with respect to our specific operation
on the matrix.
Generally, the adjacency matrix is a binary sparse matrix $A = (a_{ij})$, where $a_{ij}$ equals 0
or 1, indicating the connectivity between nodes $i$ and $j$. For a simple graph consisting of $M$
edges this matrix has $2M$ nonzero entries, but since it is symmetric only half of them are
necessary to represent the graph, which yields $M$ units of information for the edge list.

Now, if two nodes $i$ and $j$ in the graph share many neighbors, then in the adjacency
matrix row $i$ and row $j$ will have many common column entries, and likewise for column $i$
and column $j$ (due to the symmetry of the matrix).

Suppose that we apply to the graph the operation described in the last section, by
choosing $\alpha = (i, j)$ and the corresponding $\beta$. Then we no longer need row $j$ and column $j$ of the
matrix to represent the graph. The number of nonzero entries in row $j$ and column $j$ is $2k_j$,
where $k_j$ is the degree of node $j$ in the graph. After the operation, the number of nonzero entries
in the new adjacency matrix is $2M - 2k_j$, which requires $M - k_j$ units of information.
However, the extra information we have to record is encoded in $\alpha$ and $\beta$. $\alpha$ always has two
entries, which requires 1 unit of information, and the units of information for $\beta$ depend on
the number of different neighbors between node $i$ and node $j$. If $i$ and $j$ have $\Delta_{ij}$ different
neighbors, the size of $\beta$ is

$$
|\beta| = \Delta_{ij}, \tag{10}
$$

and, since each entry of $\beta$ is a single label rather than a pair, the units of information for
$\beta$ are $\frac{1}{2}\Delta_{ij}$. Taking both the reduction of the
matrix and the extra information into account, the information required after the
operation is

$$
M - k_j + 1 + \frac{1}{2}\Delta_{ij} = M - \left(k_j - 1 - \frac{1}{2}\Delta_{ij}\right). \tag{11}
$$

This is true for $i$ different from $j$. We can extend the operation to allow

$$
\alpha = (i, i), \tag{12}
$$

meaning a self-match. In that case we put all the neighbors of $i$ into the corresponding set $\beta$
and then delete the links associated with $i$. By a similar argument we find that after
this operation we need

$$
M - k_i + 1 + \frac{1}{2}k_i = M - \left(\frac{1}{2}k_i - 1\right) \tag{13}
$$

units of information in the new format.
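As a quick check of equations (11) and (13), the counts can be worked out for the graph of Figure 2, where $M = 14$, $k_1 = k_2 = 7$, and nodes 1 and 2 differ in exactly two neighbors (nodes 3 and 10). This is only an illustrative calculation using the values of that example:

```latex
\begin{align*}
\alpha = (1, 2):&\quad M - k_2 + 1 + \tfrac{1}{2}\cdot 2 = 14 - 7 + 1 + 1 = 9, \\
\alpha = (1, 1):&\quad M - k_1 + 1 + \tfrac{1}{2}k_1 = 14 - 7 + 1 + 3.5 = 11.5.
\end{align*}
```

Matching node 2 against node 1 therefore saves 5 units, whereas a self-match on node 1 saves only 2.5 units.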

Note that here we need to clarify exactly what is meant by different neighbors, since in the
case that $i$ and $j$ are connected, $j$ is a neighbor of $i$ but not of itself, and likewise for $i$. However,
this extra information can simply be encoded in $\alpha$ by the following rule: $\alpha = (i, j)$
means that we do not connect $i$ and $j$ when we reconstruct, and $\alpha = (i, -j)$ means that we
connect $i$ and $j$ when we reconstruct. Then we can write
$\Delta_{ij} = \|A(i, :) - A(j, :)\|_1 - 2a_{ij}$.
From the above discussion we see that if we define

$$
r_{ij} = \begin{cases} k_j - 1 - \frac{1}{2}\Delta_{ij} & \text{if } i \neq j, \\ \frac{1}{2}k_i - 1 & \text{if } i = j, \end{cases} \tag{14}
$$

then $r_{ij}$ measures exactly the amount of information saved by choosing $\alpha = (i, j)$. We
call $r_{ij}$ the information redundancy between nodes $i$ and $j$. Note that in general this
redundancy is not symmetric in $i$ and $j$: for any pair of nodes $\Delta_{ij}$ is symmetric, but
the degrees of the two nodes can differ, and deleting the node with the higher degree
always saves more units of information than deleting the node with the lower degree.

We form the redundancy matrix $R$ by setting the entry in row $i$ and column $j$ to be
$r_{ij}$. We perform the shrinking operation for the pair with maximum $r_{ij}$, thus saving the
maximum amount of information.

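For example, again using the graph from section 2, the adjacency matrix is:

$$
A = \begin{pmatrix}
0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix} \tag{15}
$$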
and the corresponding redundancy matrix is:

$$
R = \begin{pmatrix}
2.5 & 5 & -3 & -2.5 & -2.5 & -2.5 & -2.5 & -2.5 & -2.5 & -4 \\
5 & 2.5 & -4 & -2.5 & -2.5 & -2.5 & -2.5 & -2.5 & -2.5 & -3 \\
3 & 2 & -0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & -1 \\
2.5 & 2.5 & -0.5 & 0 & 1 & 1 & 1 & 1 & 1 & -0.5 \\
2.5 & 2.5 & -0.5 & 1 & 0 & 1 & 1 & 1 & 1 & -0.5 \\
2.5 & 2.5 & -0.5 & 1 & 1 & 0 & 1 & 1 & 1 & -0.5 \\
2.5 & 2.5 & -0.5 & 1 & 1 & 1 & 0 & 1 & 1 & -0.5 \\
2.5 & 2.5 & -0.5 & 1 & 1 & 1 & 1 & 0 & 1 & -0.5 \\
2.5 & 2.5 & -0.5 & 1 & 1 & 1 & 1 & 1 & 0 & -0.5 \\
2 & 3 & -1 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & -0.5
\end{pmatrix} \tag{16}
$$



The maximum entry in $R$ is $r_{12} = r_{21} = 5$, indicating that either choice, $\alpha = (1, 2)$ or
$\alpha = (2, 1)$, gives the maximum information reduction, and the corresponding $\beta$ can be
obtained by recording the column entries in which row 1 and row 2 differ, according to our rule.
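The redundancy matrix of the ten-node example can be recomputed directly from the adjacency matrix. The sketch below assumes the definition used in this section, namely $k_j - 1 - \frac{1}{2}\Delta_{ij}$ off the diagonal and $\frac{1}{2}k_i - 1$ on it, with $\Delta_{ij} = \|A(i,:) - A(j,:)\|_1 - 2a_{ij}$. The function name `redundancy_matrix` and the use of NumPy are our own choices, and indices are 0-based.

```python
import numpy as np

def redundancy_matrix(A):
    """R[i, j] = k_j - 1 - Delta_ij / 2 for i != j, and R[i, i] = k_i / 2 - 1."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                                   # node degrees
    # Delta_ij = ||A(i,:) - A(j,:)||_1 - 2 a_ij, computed for all pairs at once
    delta = np.abs(A[:, None, :] - A[None, :, :]).sum(axis=2) - 2 * A
    R = k[None, :] - 1 - delta / 2                      # column index j carries k_j
    np.fill_diagonal(R, k / 2 - 1)                      # self-match entries
    return R

# Adjacency matrix of the ten-node example (0-based indices for labels 1..10).
A = np.zeros((10, 10), dtype=int)
for v in range(3, 10):
    A[0, v - 1] = A[v - 1, 0] = 1      # node 1 is linked to nodes 3..9
for v in range(4, 11):
    A[1, v - 1] = A[v - 1, 1] = 1      # node 2 is linked to nodes 4..10

R = redundancy_matrix(A)
print(R)
```

The broadcasting step builds an N x N x N intermediate array, so this form is only practical for small graphs; it is meant to make the definition explicit rather than to be efficient.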

In the above discussion we only considered a one-step shrinking operation on the graph
and found the direct relationship between the maximum information reduction and the
redundancy matrix. But after deleting one node the resulting graph is still
sparse and so can be compressed further by our scheme. The question is then how to
successively choose $\alpha$ and $\beta$ to obtain the best overall compression.

B. On Greedy Optimization of the $\alpha$, $\beta$ Orbit

Let $\alpha_t = (i_t, j_t)$ denote the operation at step $t$, $t = 1, 2, \ldots, T$ (the sign of $j_t$ does
not affect our analysis, so for convenience we just write $j_t$). In order to analyze the multi-step
effect, we first consider how the adjacency matrix $A$ is affected by the orbit $\{\alpha_t\}$. Let
$A_0 = A$ be the original adjacency matrix. Let $A_t$ be the corresponding adjacency matrix
after applying $\alpha_t$, with entries $A_t(i, j)$. On deleting node $j_t$ we set row
and column $j_t$ of $A_{t-1}$ to zero and leave all other entries unchanged, to obtain the new
matrix $A_t$, i.e.

$$
A_t(i, j) = \begin{cases} A_{t-1}(i, j) & \text{if } i, j \neq j_t, \\ 0 & \text{if } i = j_t \text{ or } j = j_t. \end{cases} \tag{17}
$$

So by induction we see that

$$
A_t(i, j) = \begin{cases} A_0(i, j) & \text{if } i, j \notin \{j_1, \ldots, j_t\}, \\ 0 & \text{if } i \in \{j_1, \ldots, j_t\} \text{ or } j \in \{j_1, \ldots, j_t\}. \end{cases} \tag{18}
$$

Then we analyze how the redundancy matrix $R$ changes. Let $R_t$ denote the redundancy
matrix, $k_t(i)$ the degree of node $i$, and $\Delta_t(i, j)$ the number of different neighbors of
nodes $i$ and $j$, all associated with the graph of $A_t$. Since our goal is to achieve compression, once
a node is deleted from the graph it is useless for future operations. So we set $R_t(i, j) = 0$
if $i$ or $j$ has been deleted before, i.e.

$$
R_t(i, j) = 0 \quad \text{if } i \in \{j_1, \ldots, j_t\} \text{ or } j \in \{j_1, \ldots, j_t\}. \tag{19}
$$

Now for those $i$ and $j$ that have not been deleted, i.e. $i, j \notin \{j_1, \ldots, j_t\}$, by equation (14)
we see that $R_t(i, j) = k_t(j) - 1 - \frac{1}{2}\Delta_t(i, j)$ for $i \neq j$ and
$R_t(i, i) = \frac{1}{2}k_t(i) - 1$. Since $A_t$ is
obtained by deleting row and column $j_t$ in $A_{t-1}$, the degree of each node changes according
to

$$
k_t(i) = k_{t-1}(i) - A_{t-1}(i, j_t) \tag{20}
$$

and $\Delta_{ij}$ changes according to

$$
\Delta_t(i, j) = \Delta_{t-1}(i, j) - |A_{t-1}(i, j_t) - A_{t-1}(j, j_t)|. \tag{21}
$$

Thus, we conclude that for $i \neq j$

$$
\begin{aligned}
R_t(i, j) &= k_{t-1}(j) - A_{t-1}(j, j_t) - 1 - \frac{1}{2}\left[\Delta_{t-1}(i, j) - |A_{t-1}(i, j_t) - A_{t-1}(j, j_t)|\right] \\
&= k_{t-1}(j) - 1 - \frac{1}{2}\Delta_{t-1}(i, j) - A_{t-1}(j, j_t) + \frac{1}{2}|A_{t-1}(i, j_t) - A_{t-1}(j, j_t)| \\
&= R_{t-1}(i, j) + \left[\frac{1}{2}|A_{t-1}(i, j_t) - A_{t-1}(j, j_t)| - A_{t-1}(j, j_t)\right]
\end{aligned} \tag{22}
$$

and for $i = j$

$$
\begin{aligned}
R_t(i, i) &= \frac{1}{2}k_t(i) - 1 \\
&= \frac{1}{2}\left(k_{t-1}(i) - A_{t-1}(i, j_t)\right) - 1 \\
&= R_{t-1}(i, i) - \frac{1}{2}A_{t-1}(i, j_t).
\end{aligned} \tag{23}
$$
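The one-step update rules above can be checked numerically against a direct recomputation. The following sketch does this on the ten-node example for two successive deletions. The helper names (`redundancy_matrix`, `delete_node`, `update_redundancy`) and the `deleted` argument are assumptions introduced for this illustration, and indices are 0-based.

```python
import numpy as np

def redundancy_matrix(A):
    """Direct evaluation: R[i, j] = k_j - 1 - Delta_ij / 2, R[i, i] = k_i / 2 - 1."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    delta = np.abs(A[:, None, :] - A[None, :, :]).sum(axis=2) - 2 * A
    R = k[None, :] - 1 - delta / 2
    np.fill_diagonal(R, k / 2 - 1)
    return R

def delete_node(A, jt):
    """Set row and column jt to zero, leaving all other entries unchanged."""
    A_new = np.array(A, dtype=int)
    A_new[jt, :] = 0
    A_new[:, jt] = 0
    return A_new

def update_redundancy(R_prev, A_prev, jt, deleted=()):
    """One-step update R_{t-1} -> R_t using only column jt of A_{t-1}.

    `deleted` lists the nodes removed in earlier steps (hypothetical argument).
    """
    a = np.asarray(A_prev)[:, jt].astype(float)      # a[i] = A_{t-1}(i, j_t)
    # Off-diagonal rule: add |a_i - a_j| / 2 - a_j to the previous entry.
    R = R_prev + np.abs(a[:, None] - a[None, :]) / 2 - a[None, :]
    # Diagonal rule: subtract a_i / 2 from the previous entry.
    np.fill_diagonal(R, np.diag(R_prev) - a / 2)
    gone = list(deleted) + [jt]
    R[gone, :] = 0                                   # deleted nodes are never reused
    R[:, gone] = 0
    return R

# Ten-node example graph, 0-based indices for labels 1..10.
A0 = np.zeros((10, 10), dtype=int)
for v in range(3, 10):
    A0[0, v - 1] = A0[v - 1, 0] = 1
for v in range(4, 11):
    A0[1, v - 1] = A0[v - 1, 1] = 1

R0 = redundancy_matrix(A0)
R1 = update_redundancy(R0, A0, jt=1)                 # delete node 2
A1 = delete_node(A0, 1)
R2 = update_redundancy(R1, A1, jt=3, deleted=[1])    # then delete node 4
A2 = delete_node(A1, 3)
```

The update touches each entry of R once, so a step costs on the order of N^2 operations, compared with N^3 for the direct evaluation used here as a reference.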



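By induction, we obtain that for $i \neq j$:

$$
R_t(i, j) = R_0(i, j) + \sum_{\tau=1}^{t}\left[\frac{1}{2}|A_{\tau-1}(i, j_\tau) - A_{\tau-1}(j, j_\tau)| - A_{\tau-1}(j, j_\tau)\right] \tag{24}
$$

and for $i = j$:

$$
R_t(i, i) = R_0(i, i) + \sum_{\tau=1}^{t}\left[-\frac{1}{2}A_{\tau-1}(i, j_\tau)\right]. \tag{25}
$$

Using the fact that $i, j \notin \{j_1, \ldots, j_t\}$, by equation (18) we can simplify the above two
expressions to yield

$$
\begin{aligned}
R_t(i, j) &= R_0(i, j) + \sum_{\tau=1}^{t}\left[\frac{1}{2}|A_0(i, j_\tau) - A_0(j, j_\tau)| - A_0(j, j_\tau)\right] \quad \text{if } i \neq j, \\
R_t(i, i) &= R_0(i, i) + \sum_{\tau=1}^{t}\left[-\frac{1}{2}A_0(i, j_\tau)\right].
\end{aligned} \tag{26}
$$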

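Note that if we choose a pair $(i_t, j_t)$ at step $t$, the information we save is measured
by $R_{t-1}(i_t, j_t)$. Thus, for any orbit $\{\alpha_t = (i_t, j_t)\}_{t=1}^{T}$ satisfying
$i_t, j_t \notin \{j_1, \ldots, j_{t-1}\}$ for
$t = 2, 3, \ldots, T$ (we call such an orbit a natural orbit), the total information reduction (or
information saving) will be:

$$
\begin{aligned}
s(\{\alpha_t\}_{t=1}^{T}) &= \sum_{t=1}^{T} R_{t-1}(i_t, j_t) \\
&= \sum_{t=1}^{T}\left[R_0(i_t, j_t) + c(i_t, j_t, t)\right]
\end{aligned} \tag{27}
$$

where $c$ is defined by:

$$
\begin{aligned}
c(i, j, t) &= \sum_{\tau=1}^{t-1}\left[\frac{1}{2}|A_0(i, j_\tau) - A_0(j, j_\tau)| - A_0(j, j_\tau)\right] \quad \text{if } i \neq j, \\
c(i, i, t) &= \sum_{\tau=1}^{t-1}\left[-\frac{1}{2}A_0(i, j_\tau)\right].
\end{aligned} \tag{28}
$$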

So the compression problem can be stated as:

$$
\text{Find } \max\, s(\{\alpha_t\}_{t=1}^{T}). \tag{29}
$$

One more thing to mention is that the length of the orbit, $T$, is also a variable. It
cannot be larger than $N$, since there are only $N$ nodes in the graph and it is meaningless
to delete an 'empty' node which does not even exist.
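To make the objective concrete, the total saving of a given natural orbit can be evaluated by simply replaying the deletions and summing the redundancy of each chosen pair, measured on the matrix before that step. The sketch below does this naively by recomputing the redundancy matrix at every step; the function names are ours, indices are 0-based, and the sign convention for linked pairs is ignored because it does not affect the saving.

```python
import numpy as np

def redundancy_matrix(A):
    """R[i, j] = k_j - 1 - Delta_ij / 2 for i != j, and R[i, i] = k_i / 2 - 1."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    delta = np.abs(A[:, None, :] - A[None, :, :]).sum(axis=2) - 2 * A
    R = k[None, :] - 1 - delta / 2
    np.fill_diagonal(R, k / 2 - 1)
    return R

def total_saving(A0, orbit):
    """Sum of R_{t-1}(i_t, j_t) over a natural orbit of 0-based pairs (i_t, j_t)."""
    A = np.array(A0, dtype=int)
    deleted = set()
    s = 0.0
    for i, j in orbit:
        if i in deleted or j in deleted:
            raise ValueError("not a natural orbit: node already deleted")
        s += redundancy_matrix(A)[i, j]     # saving of this step, measured before deletion
        A[j, :] = 0                         # delete node j
        A[:, j] = 0
        deleted.add(j)
    return s

# Ten-node example graph, 0-based indices for labels 1..10.
A = np.zeros((10, 10), dtype=int)
for v in range(3, 10):
    A[0, v - 1] = A[v - 1, 0] = 1
for v in range(4, 11):
    A[1, v - 1] = A[v - 1, 1] = 1

for orbit in ([(0, 1)], [(0, 0), (1, 1)], [(0, 1), (0, 0)], [(0, 1), (2, 0)]):
    print(orbit, total_saving(A, orbit))
```

Different orbits on the same graph give different savings, which is what makes the choice of orbit an optimization problem.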
IV. GREEDY ALGORITHM FOR COMPRESSION



From the previous section we see that for a given adjacency matrix, the final compression
ratio depends on the orbit $\{\alpha_t\}_{t=1}^{T}$ we choose, so the compression problem becomes an
optimization problem. However, finding the maximum of $s$ and the corresponding best orbit
is not trivial. One reason is that the number of natural orbits is of order $N!$, which makes
it impractical to try all possible orbits. Another reason, which is crucial here, is
that for any given orbit of length $T$, evaluating $s$ costs $O(T^2)$ operations, making it hard
to find an appropriate scheme to search for the true maximum or even an approximate
maximum. Instead, we use a greedy algorithm to find an orbit which gives a reasonable
compression ratio, and which is easy to apply.

The idea of the greedy algorithm is that at each iteration step we choose the pair of nodes
$i_t$ and $j_t$ which maximizes $R_{t-1}(i, j)$ over all possible pairs, and we stop if the maximum
value is non-positive. We also need to record $\alpha_t$ and $\beta_t$ according to the graph.

Here we summarize the greedy algorithm as pseudocode:

Given the adjacency matrix $A$ of a graph ($N$ nodes and $M$ edges).

Begin:

Set $A_0 = A$;

Calculate $R_0(i, j)$ for all $i, j = 1, \ldots, N$. This forms the redundancy matrix $R_0 = R$.
Set $t = 1$.

1. Let $R_{t-1}(i_t, j_t)$ be the largest element in $R_{t-1}$.
   If $R_{t-1}(i_t, j_t) > 0$,
       record $\alpha_t = (i_t, j_t)$,
       then go to step 2.
   Else,
       End.

2. Set $\beta_t$ according to the difference between the two rows of $\alpha_t$ in $A_{t-1}$;
   Update $A_{t-1}$ to $A_t$ according to (17);
   Update $R_{t-1}$ to $R_t$ according to (22) and (23) for $i, j \neq j_t$;
   Set $R_t(i, j) = 0$ for $i$ or $j = j_t$.

3. Set $t = t + 1$ and go to step 1.
The compressed version of the matrix consists of the final matrix $A_T$, the orbit
$(\alpha_1, \ldots, \alpha_T)$ and the sets $\{\beta_1, \ldots, \beta_T\}$, which allow us to reconstruct $A_0 = A$ and any
intermediate matrix $A_t$ of the compression process.
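The greedy procedure and the reconstruction from the compressed record can be sketched together as a round trip. This is a simplified illustration, not the authors' implementation: it recomputes the redundancy matrix from scratch at every step instead of updating it incrementally, it uses 0-based indices, it stores an explicit `linked` flag in each orbit entry in place of the sign convention, and each correction is a `(node, sign)` pair. All function names are hypothetical.

```python
import numpy as np

def redundancy_matrix(A):
    """R[i, j] = k_j - 1 - Delta_ij / 2 for i != j, and R[i, i] = k_i / 2 - 1."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    delta = np.abs(A[:, None, :] - A[None, :, :]).sum(axis=2) - 2 * A
    R = k[None, :] - 1 - delta / 2
    np.fill_diagonal(R, k / 2 - 1)
    return R

def greedy_compress(A0):
    """Return (A_T, alphas, betas) for a symmetric 0/1 matrix A0."""
    A = np.array(A0, dtype=int)
    alive = np.ones(len(A), dtype=bool)
    alphas, betas = [], []
    while True:
        R = redundancy_matrix(A)
        R[~alive, :] = 0                      # deleted nodes are never reused
        R[:, ~alive] = 0
        i, j = (int(x) for x in np.unravel_index(np.argmax(R), R.shape))
        if R[i, j] <= 0:                      # stop when nothing can be saved
            break
        if i == j:                            # self-match: list every neighbor
            beta = [(int(v), +1) for v in np.flatnonzero(A[j])]
        else:
            only_i = np.flatnonzero((A[i] == 1) & (A[j] == 0))
            only_j = np.flatnonzero((A[j] == 1) & (A[i] == 0))
            beta = ([(int(v), -1) for v in only_i if v != j] +
                    [(int(v), +1) for v in only_j if v != i])
        alphas.append((i, j, int(A[i, j])))   # third entry: were i and j linked?
        betas.append(beta)
        A[j, :] = 0                           # delete node j
        A[:, j] = 0
        alive[j] = False
    return A, alphas, betas

def decompress(A_T, alphas, betas):
    """Replay the orbit backwards to recover the original matrix."""
    A = np.array(A_T, dtype=int)
    for (i, j, linked), beta in zip(reversed(alphas), reversed(betas)):
        if i != j:
            A[j, :] = A[i, :]                 # 'copy' node i onto node j
            A[:, j] = A[:, i]
        for v, sign in beta:
            A[j, v] = A[v, j] = 1 if sign > 0 else 0
        if linked:
            A[i, j] = A[j, i] = 1
    return A

def storage_units(A_T, betas):
    """Edges left in A_T, one unit per orbit entry, half a unit per correction."""
    return int(A_T.sum()) // 2 + sum(1 + len(b) / 2 for b in betas)

# Ten-node example graph, 0-based indices for labels 1..10.
A0 = np.zeros((10, 10), dtype=int)
for v in range(3, 10):
    A0[0, v - 1] = A0[v - 1, 0] = 1
for v in range(4, 11):
    A0[1, v - 1] = A0[v - 1, 1] = 1

A_T, alphas, betas = greedy_compress(A0)
print(alphas, betas, storage_units(A_T, betas))
```

Ties for the maximum are broken here by taking the first maximal entry in row-major order, which is an arbitrary choice; a different tie-breaking rule may produce a different orbit.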

V. EXAMPLES OF APPLICATION TO GRAPHS 

In this section we will show some examples of our compression scheme on several networks. 
We begin with the lattice graph, which is expected to be readily compressible due to the high 
degree of overlapping between neighbors of nodes. As a secondary example, we add some 
random alterations, and apply our method to the corresponding Watts-Strogatz network. 
Finally we show some results for real-world networks. 

A. A Simple Benchmark Example: Lattice Graph 

One of the most symmetric graphs is the lattice graph, a one-dimensional chain where
each site is connected to its $\frac{k}{2}$ nearest neighbors to the right and to the left. In this case
$\langle k \rangle = k$ is the degree of each vertex in the lattice graph. When the total number of nodes satisfies
$N \gg \langle k \rangle$, the corresponding adjacency matrix is sparse.

We implement our algorithm for lattice graphs with different $\langle k \rangle$. The results are
shown in Figure 7. Here we take $N = 500$.
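A lattice graph of this kind is easy to generate. The sketch below assumes periodic boundary conditions (a ring), so that every node has exactly k neighbors; the text does not state the boundary condition, and the function name `ring_lattice` is ours.

```python
import numpy as np

def ring_lattice(n, k):
    """Adjacency matrix of a ring in which each node links to k/2 neighbors per side."""
    if k % 2 or k >= n:
        raise ValueError("k must be even and smaller than n")
    A = np.zeros((n, n), dtype=int)
    idx = np.arange(n)
    for d in range(1, k // 2 + 1):
        A[idx, (idx + d) % n] = 1          # neighbor d steps to the right
        A[(idx + d) % n, idx] = 1          # and the symmetric entry
    return A

A = ring_lattice(500, 40)
print(A.sum(axis=1)[:5], A.sum() // 2)     # degrees of the first nodes, edge count
```

With N = 500 and k = 40 the matrix has 10000 edges out of roughly 125000 possible pairs, so it is still sparse while neighboring nodes share almost all of their neighbors.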

B. Compressing a Watts-Strogatz Small-World Graph

It is not surprising that lattice graphs are easy to compress, since these graphs are
highly symmetric and their nodes have large overlaps in their neighbors. However, even when
we do not have such perfect symmetry, we still hope to achieve compression. Here we
apply our algorithm to WS graphs. The WS graph comes from the well-known Watts-Strogatz
model, which reproduces the so-called small-world phenomenon observed in real-world networks.
The WS graph is generated from a lattice graph by the usual rewiring of each edge, with
some given probability $p$, to an endpoint drawn from the uniform distribution.
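One common form of this rewiring can be sketched as follows. The details are assumptions on our part, since the text only refers to "the usual rewiring": the base lattice is taken to be a ring, each lattice edge keeps one endpoint and moves the other to a uniformly chosen node, and self-loops and duplicate links are avoided. The function name `watts_strogatz_edges` is hypothetical.

```python
import random

def watts_strogatz_edges(n, k, p, seed=None):
    """Ring lattice on nodes 0..n-1 (k/2 neighbors per side), each edge rewired with probability p."""
    rng = random.Random(seed)
    neighbors = {u: set() for u in range(n)}
    lattice = [(u, (u + d) % n) for d in range(1, k // 2 + 1) for u in range(n)]
    for u, v in lattice:
        neighbors[u].add(v)
        neighbors[v].add(u)
    for u, v in lattice:
        if rng.random() < p:
            # Candidate endpoints: any node except u itself and u's current neighbors.
            choices = [w for w in range(n) if w != u and w not in neighbors[u]]
            if not choices:
                continue
            w = rng.choice(choices)
            neighbors[u].discard(v)
            neighbors[v].discard(u)
            neighbors[u].add(w)
            neighbors[w].add(u)
    # Return each undirected edge once, as a sorted pair.
    return sorted({(min(u, w), max(u, w)) for u in neighbors for w in neighbors[u]})

lattice_edges = watts_strogatz_edges(20, 4, 0.0)
rewired_edges = watts_strogatz_edges(50, 6, 0.3, seed=1)
print(len(lattice_edges), len(rewired_edges))
```

Rewiring keeps the number of edges fixed, so the average degree is unchanged while the overlap between neighborhoods decreases as p grows.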

We apply our algorithm to WS graphs with different $p$ to explore how $p$ affects the
compression behavior. Results are shown in Figure 8.
Figure 7: Compression results for lattice graphs. Stars indicate the final compression ratios for
the lattice graphs with $\langle k \rangle = 2$ to $40$. The compression limit is indicated by the bottom curve,
given by $\eta^* = \frac{2}{\langle k \rangle}$, and we find that for large $\langle k \rangle$ the compression ratio is close to the
empirical formula $\eta = \frac{3}{\langle k \rangle}$ (upper curve). For comparison, we plot the result using YSMF
(broken line): $\eta_Y = \frac{1}{2} + \frac{1}{\langle k \rangle}$. For $\langle k \rangle > 2$, our algorithm always achieves a better
result than the YSMF, and the advantage increases with increasing $\langle k \rangle$.

C. Real-World Graphs



In the following we show the compression results for some real world graphs: a C. elegans
metabolic network [10] (Figure 9), a yeast network constructed from yeast reactions [9],
an email network [11] and an airline network of flight connections [12]. In Table I we
summarize the compression results for these real world graphs.



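Network          $N$     $\langle k \rangle$    $\eta_Y$                                   $\eta$                       $\eta^*$
Lattice          $N$     $\langle k \rangle$    $\frac{1}{2} + \frac{1}{\langle k \rangle}$    $\frac{3}{\langle k \rangle}$    $\frac{2}{\langle k \rangle}$
Yeast [9]        2361    6.08     0.66                                       0.50                         0.33
Metabolic [10]   453     9.01     0.61                                       0.43                         0.22
Email [11]       1133    9.62     0.60                                       0.49                         0.21
Airline [12]     332     12.81    0.58                                       0.31                         0.16

Table I: Compression results for some networks. Here $\eta_Y$ is the compression ratio of YSMF, $\eta$ is the
ratio achieved by our algorithm, and $\eta^*$ is the ideal limit of our method.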
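The YSMF ratio and the ideal limit in Table I depend only on the average degree, so those two columns can be recomputed from the listed values of the average degree. The short sketch below does so using the formulas $\frac{1}{2} + \frac{1}{\langle k \rangle}$ and $\frac{2}{\langle k \rangle}$; the function names are ours.

```python
def yale_ratio(mean_degree):
    """YSMF ratio (M + N) / (2 M) = 1/2 + 1/<k>, using <k> = 2M/N."""
    return 0.5 + 1.0 / mean_degree

def ideal_ratio(mean_degree):
    """Ideal limit N / M = 2/<k>, reached when only the N orbit pairs are stored."""
    return 2.0 / mean_degree

mean_degrees = {"Yeast": 6.08, "Metabolic": 9.01, "Email": 9.62, "Airline": 12.81}
for name, k in mean_degrees.items():
    print(f"{name:10s} YSMF = {yale_ratio(k):.2f}   ideal = {ideal_ratio(k):.2f}")
```

The ratio actually achieved by the greedy algorithm cannot be obtained this way, since it depends on the detailed structure of each network.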
Figure 8: Compression results for WS graphs. Here the base lattice graph has $N = 500$ and
$\langle k \rangle = 40$. The stars show the compression results of our algorithm. The lower line is the
compression ratio for the lattice with $N = 500$ and $\langle k \rangle = 40$, and the upper line is the ratio from
the YSMF. We see that as $p$ increases there is less and less overlap between neighbors in the
network and the compression ratio increases. For $p \gtrsim 0.5$, we obtain a worse result than YSMF.




Figure 9: Compression process for the Metabolic network [10]: compression ratio $\eta$ during each step
(left), and information redundancy at each step (right).

VI. DISCUSSION 



From the previous section we see that our algorithm works for various kinds of graphs and
gives a reasonable result. The ideal limit of our method for a graph with $N$ nodes, $M$ edges
and average degree $\langle k \rangle = \frac{2M}{N}$, which is relatively large, is $\frac{2}{\langle k \rangle}$. This is obtained when each
$\beta_t$ during the compression process is empty, meaning that most of the nodes share common
neighbors, in which case we only need to record all the $\alpha_t$, requiring $N$ units of information
and yielding

$$
\eta^* = \frac{N}{M} = \frac{2}{\langle k \rangle}. \tag{30}
$$

Notice that trees do not compress: for trees $\langle k \rangle \approx 2$, so on average the overlap in
neighbors will be even smaller (likely to be 0), and a possible way to achieve compression
is by self-matching of large degree nodes, for example the hub of a star graph. For
comparison, the YSMF always gives the compression ratio

$$
\eta_Y = \frac{1}{2} + \frac{1}{\langle k \rangle}, \tag{31}
$$

which does not compress trees and has a lower bound of $\frac{1}{2}$, while our method in principle
approaches 0 as $\langle k \rangle \to \infty$. In fact, the compression ratio of YSMF can be achieved
by choosing a special orbit in our approach which contains only self-matches, i.e.

$$
\{\alpha_i\}_{i=1}^{N} = \{(i, i)\}_{i=1}^{N}. \tag{32}
$$

In this case the neighbors of each node are put into the corresponding $\beta$ sets, and since every $\alpha_i$
contains the same number twice, $(i, i)$, we can use a single $i$ to represent the pair, resulting
in a total of $\frac{1}{2}(N + M)$ information units. So our approach can be considered a generalization of
the YSMF.

However, as we observed in our compression results, the compression ratio given by (30) is
in general not attainable, since it is only achieved in the ideal case that nearly every node
in the graph shares the same neighbors, and yet the graph needs to be sparse. For
lattices, however, we observe that the actual compression ratio achieved by our algorithm is about
$\frac{3}{\langle k \rangle}$, which is of the same order as the ideal compression ratio. For WS graphs, when the noise $p$
is small, our algorithm achieves a better compression ratio than YSMF, and the compression
ratio depends nearly linearly on $p$ for $p < 0.5$. For $p > 0.5$ the graph resembles Erdős-Rényi
random graphs [13]: there is no symmetry between nodes to be used, and thus our
approach does not give a good result compared to the YSMF.

For real world graphs, the results of our algorithm are better than those of YSMF, but not
as good as those we observed for lattice graphs. This suggests that in real world graphs nodes,
in general, share a certain number of common neighbors even when the total number of links
is small. This kind of overlap in neighbors is certainly not as common as in lattice
graphs, since real world graphs in general have more complicated structures.